install.packages("grafzahl")
grafzahl::setup_grafzahl(cuda = grafzahl::detect_cuda())
Workshop Automated Content Analysis
Session 5: Deep Learning
Introduction
In the last session, we worked with some pre-trained word embedding models and found that they still come with a lot of problems. The next breakthrough in the development of embeddings was the transformer architecture (Vaswani et al. 2017). Transformers have several advantages, which lead to models that:
- Can take larger contexts into account when training (remember the limited window we used before)
- Can be trained much more efficiently and can hence take in even more texts
- Can have several embeddings for each word depending on the context, finally moving away from the bag-of-words paradigm
- Can be fine-tuned on new data which contains different vocabulary
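To make the point about context-dependent embeddings more concrete, here is a toy sketch of the self-attention idea in plain Python. It rests on strong simplifying assumptions: the three-dimensional vectors are made up, and queries, keys and values are the static vectors themselves, whereas a real transformer learns separate projections for each. It only illustrates why the same word can end up with different vectors in different sentences.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextualise(sentence, static):
    """Return one context-dependent vector per token via self-attention.

    Simplification: queries, keys and values are the static vectors
    themselves; a real transformer learns separate projections for each.
    """
    vectors = [static[token] for token in sentence]
    dim = len(vectors[0])
    output = []
    for query in vectors:
        # scaled dot product of this token with every token in the sentence
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # the new vector is a weighted average of all token vectors
        output.append([sum(w * v[i] for w, v in zip(weights, vectors))
                       for i in range(dim)])
    return output


# Made-up static embeddings (hypothetical values, not from a real model)
static = {
    "river": [1.0, 0.0, 0.2],
    "money": [0.0, 1.0, 0.2],
    "bank": [0.5, 0.5, 1.0],
}

bank_near_river = contextualise(["river", "bank"], static)[1]
bank_near_money = contextualise(["money", "bank"], static)[1]
print(bank_near_river)
print(bank_near_money)
```

The two printed vectors for "bank" differ because each one mixes in a different neighbour, which a single static embedding per word cannot do.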
However, compared to established approaches like Naive Bayes or SVM classifiers, this technology is still relatively new. The step that happened about 10-15 years ago, when many of the older methods were implemented in R, has not really happened yet for transformers. The models also only run well on powerful, recent hardware, since the required matrix computations are slow on CPUs and much faster on a GPU.
So this session is currently more a preview than an actual hands-on tutorial.
R wrappers for large language models
Another problem with LLMs is that they are predominantly controlled from Python. R has excellent wrappers for languages like C, C++, Rust or Java, but calling Python from R is still a little less comfortable. Packages like spacyr and grafzahl employ Python anyway through the reticulate compatibility layer. (They do still have some issues to figure out.)
Let’s see how we can use grafzahl to classify the imdb data again, which we used in the supervised machine learning session. As a first step, we have to set up the package (with the install.packages() and setup_grafzahl() calls shown at the top of this page):
This installs a small version of Python on your system and sets up R to use it. The cuda argument essentially controls whether grafzahl can access your NVIDIA graphics card. If you do not have one of these cards, you can still use the underlying models, but they will be very slow.
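If you are unsure whether your machine has a usable NVIDIA card, a rough check from the Python side could look like the sketch below. This is not what grafzahl::detect_cuda() does internally (I have not checked its implementation). It asks PyTorch if it is installed and otherwise falls back to looking for the nvidia-smi tool that ships with the NVIDIA driver, which is only a heuristic.

```python
import shutil


def cuda_probably_available():
    """Best-effort check for a usable NVIDIA GPU."""
    try:
        import torch  # only present if PyTorch is installed
        return bool(torch.cuda.is_available())
    except ImportError:
        # Fallback heuristic: the NVIDIA driver ships the nvidia-smi tool.
        # Finding it does not guarantee that CUDA actually works.
        return shutil.which("nvidia-smi") is not None


print("GPU likely usable:", cuda_probably_available())
```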
The steps for SML are still the same:
- preprocessing the incoming text
We do not really have to do anything here as transformer models already take care of most steps and the dataset we are using is already quite clean.
- splitting the dataset into training and a test set (which is not included in the model and just used for validation)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.1.0 ──
✔ broom 1.0.4 ✔ rsample 1.1.1
✔ dials 1.2.0 ✔ tune 1.1.1
✔ infer 1.0.4 ✔ workflows 1.1.3
✔ modeldata 1.1.0 ✔ workflowsets 1.0.1
✔ parsnip 1.1.0 ✔ yardstick 1.2.0
✔ recipes 1.0.6
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Search for functions across packages at https://www.tidymodels.org/find/
imdb <- readRDS("data/imdb.rds")
set.seed(1)
split <- initial_split(
  data = imdb,
  prop = 3 / 4, # the prop is the default, I just wanted to make that visible
  strata = label # this makes sure the prevalence of labels is still the same afterwards
)
imdb_train <- training(split)
imdb_test <- testing(split)
- fitting (or training) the model
if (!file.exists("data/5_imdb_distilbert.rds")) {
  library(grafzahl)
  model <- grafzahl(x = imdb_train$text,
                    y = imdb_train$label,
                    model_name = "distilbert-base-uncased",
                    output_dir = "model",
                    cuda = TRUE,
                    num_train_epochs = 1,
                    train_size = 1)
  saveRDS(model, "data/5_imdb_distilbert.rds")
} else {
  model <- readRDS("data/5_imdb_distilbert.rds")
}
If you do not have a graphics card, this step will take a long time.
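Because training is slow, the chunk above caches the fitted model on disk and only retrains when the file is missing. The same compute-once pattern can be sketched in Python as follows. The helper name cached() and the stand-in training function are made up for illustration, and pickle plays the role of saveRDS()/readRDS().

```python
import os
import pickle
import tempfile


def cached(path, compute):
    """Load a pickled result from `path`, or compute and store it first."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []


def slow_training():
    # stand-in for an expensive model fit (hypothetical)
    calls.append(1)
    return {"model": "fitted"}


# use a temporary folder so the demo does not touch the project
cache_file = os.path.join(tempfile.mkdtemp(), "model.pkl")
first = cached(cache_file, slow_training)   # trains and saves
second = cached(cache_file, slow_training)  # loads from disk
print("training runs:", len(calls))
```

One thing to keep in mind with this pattern: if you change the training arguments, you need to delete the cached file, otherwise the old model is silently reused.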
- using the test set to compare predictions against the real values for validation
library(gt)
if (!file.exists("data/5_imdb_distilbert_prediction.rds")) {
library(grafzahl)
estimates <- predict(model, newdata = imdb_test$text)
saveRDS(estimates, "data/5_imdb_distilbert_prediction.rds")
} else {
estimates <- readRDS("data/5_imdb_distilbert_prediction.rds")
}
imdb_prediction <- imdb_test |>
bind_cols(estimate = estimates) |>
mutate(truth = factor(label),
estimate = factor(estimate))
my_metrics <- metric_set(accuracy, kap, precision, recall, f_meas)
my_metrics(imdb_prediction, truth = truth, estimate = estimate) |>
gt() |>
data_color(
columns = .estimate,
fn = scales::col_numeric(
palette = c("red", "orange", "green"),
domain = c(0, 1)
)
)
| .metric | .estimator | .estimate |
|---|---|---|
| accuracy | binary | 0.8650400 |
| kap | binary | 0.7300800 |
| precision | binary | 0.8774194 |
| recall | binary | 0.8486400 |
| f_meas | binary | 0.8627898 |
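To see what the five metrics in the table measure, here is a small sketch that computes them by hand from a vector of true and predicted labels. The toy labels are made up, and the function is a plain re-implementation for illustration, not yardstick's code. As far as I know, yardstick treats the first factor level (here "neg") as the event of interest by default, so the sketch does the same.

```python
def binary_metrics(truth, estimate, event):
    """Compute accuracy, kappa, precision, recall and F1 for one event class.

    Assumes the event class is both present and predicted at least once,
    otherwise precision or recall would divide by zero.
    """
    pairs = list(zip(truth, estimate))
    n = len(pairs)
    tp = sum(t == event and e == event for t, e in pairs)
    fp = sum(t != event and e == event for t, e in pairs)
    fn = sum(t == event and e != event for t, e in pairs)
    tn = n - tp - fp - fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_meas = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kap = (accuracy - expected) / (1 - expected)

    return {"accuracy": accuracy, "kap": kap, "precision": precision,
            "recall": recall, "f_meas": f_meas}


# Made-up toy data: 4 of 6 predictions are correct
truth = ["neg", "neg", "neg", "pos", "pos", "pos"]
estimate = ["neg", "neg", "pos", "pos", "pos", "neg"]
metrics = binary_metrics(truth, estimate, event="neg")
for name, value in metrics.items():
    print(name, round(value, 3))
```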
Working with Python in R
Why combine Python with R?

Why not just switch to Python?
- If you’re here, you probably already know R so why re-learn things from scratch?
- R is a programming language specifically for statistics with some great built-in functionality that you would miss in Python.
- R has absolutely outstanding packages for data science with no drop-in replacement in Python (e.g., ggplot2, dplyr, tidytext).
Why not just stick with R then?
- Newer models and methods in machine learning are often Python only (as advancements are made by big companies who rely on Python)
- You might want to collaborate with someone who uses Python and need to run their code
- Learning a new (programming) language is always good for extending your skills (also in the language(s) you already know)
Getting started
We start by installing the necessary Python packages, for which you should use a virtual environment (so we set that one up first).
Create a Virtual Environment
Before you load reticulate for the first time, we need to create a virtual environment. This is a folder in your project directory with a link to Python and the packages you want to use in this project. Why?
- Packages (or their dependencies) on the Python Package Index can be incompatible with each other, meaning you can break things by updating.
- Your operating system might keep older versions of some packages around, which means you could break your OS with an accidental update!
- This also makes projects more reproducible on other systems, as you keep track of the specific version of each package used in your project (you could do this in R with the renv package).
To grab the correct version of Python to link to in the virtual environment:
if (R.Version()$os == "mingw32") {
  system("where python") # for Windows
} else {
  system("whereis python")
}
I chose the main Python installation in “/usr/bin/python” and use it as the base for a virtual environment. If you don’t have any Python version on your system, you can install one with reticulate::install_miniconda().
# I added this if condition to not accidentally overwrite the environment when rerunning the notebook
if (!reticulate::virtualenv_exists(envname = "./python-env/")) {
reticulate::virtualenv_create("./python-env/", python = "/usr/bin/python")
# for Windows the path is usually "C:/Users/{user}/AppData/Local/r-miniconda/python.exe"
}
reticulate::virtualenv_exists(envname = "./python-env/")
[1] TRUE
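To demystify what a virtual environment is, the following sketch creates one with Python's built-in venv module and shows where the interpreter ends up. It uses a throwaway temporary folder so nothing in your project is overwritten. Whether reticulate uses exactly this mechanism under the hood is an assumption on my part; the resulting folder layout is the relevant part.

```python
import os
import sys
import tempfile
import venv


def interpreter_path(env_dir):
    """Path of the Python binary inside a virtual environment."""
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


# Throwaway location instead of "./python-env/"
env_dir = os.path.join(tempfile.mkdtemp(), "python-env")
venv.create(env_dir, with_pip=False)  # with_pip=False keeps the demo fast

# pyvenv.cfg records which base Python the environment links to
with open(os.path.join(env_dir, "pyvenv.cfg"), encoding="utf-8") as handle:
    print(handle.read())
print(interpreter_path(env_dir))
```

The environment is just a folder: a small config file pointing at the base Python, a copy of or link to the interpreter, and a site-packages directory that holds the packages for this project only.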
reticulate is supposed to automatically pick this up when started, but to make sure, I set the environment variable RETICULATE_PYTHON to the binary of Python in the new environment:
if (R.Version()$os == "mingw32") {
python_path <- file.path(getwd(), "python-env/Scripts/python.exe")
} else {
python_path <- file.path(getwd(), "python-env/bin/python")
}
file.exists(python_path)
[1] TRUE
Sys.setenv(RETICULATE_PYTHON = python_path)
Optional: make this persist across restarts of RStudio by saving the environment variable into an .Renviron file (otherwise the Sys.setenv() line above needs to be in every script):
# open the .Renviron file
usethis::edit_r_environ(scope = "project")
# or directly append it with the necessary line
readr::write_lines(
x = paste0("RETICULATE_PYTHON=", python_path),
file = ".Renviron",
append = TRUE
)
Now reticulate should pick up the correct binary in the project folder:
library(reticulate)
py_config()
python: /home/johannes/Documents/Github/aca_vienna/python-env/bin/python
libpython: /usr/lib/libpython3.11.so
pythonhome: /home/johannes/Documents/Github/aca_vienna/python-env:/home/johannes/Documents/Github/aca_vienna/python-env
version: 3.11.3 (main, Jun 5 2023, 09:32:32) [GCC 13.1.1 20230429]
numpy: /home/johannes/Documents/Github/aca_vienna/python-env/lib/python3.11/site-packages/numpy
numpy_version: 1.24.4
NOTE: Python version was forced by RETICULATE_PYTHON
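One caveat about the .Renviron snippet further up: because it appends, rerunning it adds another RETICULATE_PYTHON line each time. A minimal sketch of an idempotent alternative is shown below in Python. The helper set_env_line() is made up for this example, and the demo writes to a temporary file rather than a real .Renviron.

```python
import os
import tempfile


def set_env_line(path, key, value):
    """Write KEY=value to an env file, replacing any earlier KEY line."""
    lines = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    # drop earlier definitions of the same variable, keep everything else
    kept = [line for line in lines if not line.startswith(key + "=")]
    kept.append(f"{key}={value}")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(kept) + "\n")


# Demo on a temporary file with placeholder paths
renviron = os.path.join(tempfile.mkdtemp(), ".Renviron")
set_env_line(renviron, "RETICULATE_PYTHON", "/old/python-env/bin/python")
set_env_line(renviron, "RETICULATE_PYTHON", "/new/python-env/bin/python")
with open(renviron, encoding="utf-8") as handle:
    content = handle.read()
print(content)
```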
Installing Packages
reticulate::py_install() installs packages, similar to install.packages(). Let’s install the packages we need:
reticulate::py_install(c(
"scikit-learn<1.3.0",
"bertopic==0.14.1", # this one requires some build tools not usually available on Windows, comment out to install the rest
"sentence_transformers",
"simpletransformers"
))
Recreating grafzahl from Python
(If you do not have an NVIDIA graphics card, following these steps locally does not make a lot of sense. You can instead run this on the Google Colab I created.) To demonstrate the workflow for reticulate, we do the same analysis as above, but rely on Python functions:
import pandas as pd
import os
import torch
from simpletransformers.classification import ClassificationModel
# args copied from grafzahl, learn more at https://simpletransformers.ai/docs/usage/
model_args = {
    "num_train_epochs": 1,  # increase for multiple runs, which can yield better performance
    "use_multiprocessing": False,
    "use_multiprocessing_for_evaluation": False,
    "overwrite_output_dir": True,
    "reprocess_input_data": True,
    "fp16": True,
    "save_steps": -1,
    "save_eval_checkpoints": False,
    "save_model_every_epoch": False,
    "silent": True,
}
os.environ["TOKENIZERS_PARALLELISM"] = "false"
roberta_model = ClassificationModel(model_type="roberta",
                                    model_name="roberta-base",
                                    # Use GPU if available
                                    use_cuda=torch.cuda.is_available(),
                                    args=model_args)
We have constructed a training and test set from the movie review corpus in R above. Now we can train the model on the coded training set and predict the classes for the test set (if you do not have a GPU, this will take a long time, so maybe do it after the course):
# process data to the form simpletransformers needs
train_df = r.imdb_train
train_df['labels'] = train_df['label'].astype('category').cat.codes
train_df = train_df[['text', 'labels']]
roberta_model.train_model(train_df)
# test data needs to be a list
test_l = r.imdb_test["text"].tolist()
predictions, raw_outputs = roberta_model.predict(test_l)
imdb_prediction <- imdb_test |>
bind_cols(estimate = factor(c("neg", "pos"))[py$predictions + 1]) |>
mutate(truth = factor(label))
saveRDS(imdb_prediction, "data/5_imdb_roberta.rds")
# imdb_prediction <- readRDS("data/5_imdb_roberta.rds")
my_metrics <- metric_set(accuracy, kap, precision, recall, f_meas)
my_metrics(imdb_prediction, truth = truth, estimate = estimate) |>
gt() |>
data_color(
columns = .estimate,
fn = scales::col_numeric(
palette = c("red", "orange", "green"),
domain = c(0, 1)
)
)
| .metric | .estimator | .estimate |
|---|---|---|
| accuracy | binary | 0.8992000 |
| kap | binary | 0.7984000 |
| precision | binary | 0.9082134 |
| recall | binary | 0.8881600 |
| f_meas | binary | 0.8980747 |
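A detail that is easy to miss in the workflow above is the label encoding. The Python chunk turns the labels into integer codes (with pandas, `.cat.codes` numbers string categories in sorted order, so "neg" becomes 0 and "pos" becomes 1), and the R chunk maps the predicted codes back, adding 1 because R indexes from 1. The round trip can be sketched without pandas like this; the helper names are made up for illustration.

```python
def encode_labels(labels):
    """Map string labels to integer codes in alphabetical order."""
    categories = sorted(set(labels))
    lookup = {name: code for code, name in enumerate(categories)}
    return [lookup[name] for name in labels], categories


def decode_labels(codes, categories):
    """Map integer codes back to their string labels (0-based)."""
    return [categories[code] for code in codes]


labels = ["pos", "neg", "neg", "pos"]
codes, categories = encode_labels(labels)
print(codes)
print(categories)
print(decode_labels([1, 0], categories))
```

If the order of categories differed between the two languages, the metrics would silently be computed against flipped labels, so it is worth checking a few predictions by eye.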
Running unsupervised learning with BERTopic
I use the data_corpus_guardian from quanteda.corpora to show an example workflow for BERTopic. This dataset contains Guardian newspaper articles in politics, economy, society and international sections from 2012 to 2016.
library(quanteda.corpora)
corp_news <- download("data_corpus_guardian")[["documents"]]
corp_news_texts <- corp_news$texts
from bertopic import BERTopic
/home/johannes/Documents/Github/aca_vienna/python-env/lib/python3.11/site-packages/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit()
from sentence_transformers import SentenceTransformer
from umap import UMAP
# To make this example reproducible
umap_model = UMAP(n_neighbors=15, n_components=5,
min_dist=0.0, metric='cosine', random_state=42)
# confusingly, this is the setup part
topic_model = BERTopic(language="english",
top_n_words=5,
n_gram_range=(1, 2),
nr_topics="auto", # change if you want a specific nr of topics
calculate_probabilities=True,
umap_model=umap_model)
# and only here we actually run something
topics, doc_topic = topic_model.fit_transform(r.corp_news.texts)
# save the model
topic_model.save("data/5._bertopic")
# topic_model=BERTopic.load("data/5._bertopic")
Unlike traditional topic models, BERTopic uses an algorithm that automatically determines a sensible number of topics and also automatically labels topics:
topic_model <- py$topic_model
topic_labels <- tibble(topic = as.integer(names(topic_model$topic_labels_)),
label = unlist(topic_model$topic_labels_ )) |>
mutate(label = fct_reorder(label, topic))
topic_labels
# A tibble: 93 × 2
topic label
<int> <fct>
1 -1 -1_the_to_of_and
2 0 0_the_to_of_and
3 1 1_trump_clinton_the_in
4 2 2_her_she_was_he
5 3 3_bank_the_the bank_to
6 4 4_police_officers_the_was
7 5 5_housing_property_homes_the
8 6 6_nhs_care_the nhs_health
9 7 7_climate_climate change_the_and
10 8 8_the_tax_pay_to
# ℹ 83 more rows
Note that -1 describes a trash topic with words and documents that do not really belong anywhere. BERTopic also supplies the top words, i.e., the ones that most likely belong to each topic. In the code above I requested 5 words for each topic:
top_words <- map_df(names(topic_model$topic_representations_), function(t) {
map_df(topic_model$topic_representations_[[t]], function(y)
tibble(feature = y[[1]], prob = y[[2]])) |>
mutate(topic = as.integer(t), .before = 1L)
})
We can plot them in the same way as in the last session:
library(tidytext)
top_words |>
filter(topic %in% c(1, 7, 44, 53, 65, 66)) |> # select a couple of topics
left_join(topic_labels, by = "topic") |>
mutate(feature = reorder_within(feature, prob, topic)) |>
ggplot(aes(x = prob, y = feature, fill = topic, label = label)) +
geom_col(show.legend = FALSE) +
facet_wrap(vars(label), ncol = 2, scales = "free_y") +
scale_y_reordered() +
labs(x = NULL, y = NULL)
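The reshaping done in R above can be easier to follow once you see the shape of the underlying Python object: a dictionary that maps each topic number to a list of (word, score) pairs. The sketch below rebuilds the long table and the default-style labels from such a dictionary. The scores and the fifth word of each topic are made up; only the structure and the first four words follow the output shown earlier. Judging from that output, the labels are the topic number plus the first four words.

```python
def long_format(representations):
    """Flatten {topic: [(word, score), ...]} into one row per topic-word pair."""
    rows = []
    for topic, words in representations.items():
        for feature, prob in words:
            rows.append({"topic": int(topic), "feature": feature, "prob": prob})
    return rows


def default_label(topic, words, n_words=4):
    """Build a label like '1_trump_clinton_the_in' from the top words."""
    return "_".join([str(topic)] + [feature for feature, _ in words[:n_words]])


# Made-up scores; only the structure mirrors the model output
representations = {
    1: [("trump", 0.03), ("clinton", 0.02), ("the", 0.01),
        ("in", 0.01), ("he", 0.01)],
    6: [("nhs", 0.04), ("care", 0.02), ("the nhs", 0.02),
        ("health", 0.02), ("patients", 0.01)],
}

rows = long_format(representations)
labels = {topic: default_label(topic, words)
          for topic, words in representations.items()}
print(rows[0])
print(labels)
```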
We can use a nice little visualization built into BERTopic to show how topics are linked to one another:
# map intertopic distance
intertopic_distance = topic_model.visualize_topics(width=700, height=700)
# save fig
intertopic_distance.write_html("media/bert_corp_news_intertopic.html")
htmltools::includeHTML("media/bert_corp_news_intertopic.html")
BERTopic also assigns documents to the topic categories (again, not really how you should use LDA topic models) and provides a nice visualisation for trends over time. Unfortunately, the date format in R does not translate automagically to Python, hence we need to convert the dates to strings:
corp_news_t <- corp_news |>
  mutate(date_chr = as.character(date))
topics_over_time = topic_model.topics_over_time(docs=r.corp_news_t.texts,
timestamps=r.corp_news_t.date_chr,
global_tuning=True,
evolution_tuning=True,
nr_bins=20)
#plot figure
fig_overtime = topic_model.visualize_topics_over_time(topics_over_time,
topics=[1, 7, 44, 53, 65, 66])
#save figure
fig_overtime.write_html("media/fig_overtime.html")
htmltools::includeHTML("media/fig_overtime.html")
Wrap up
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: EndeavourOS
Matrix products: default
BLAS: /usr/lib/libblas.so.3.11.0
LAPACK: /usr/lib/liblapack.so.3.11.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Berlin
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidytext_0.4.1 quanteda.corpora_0.9.2 reticulate_1.28
[4] gt_0.9.0 yardstick_1.2.0 workflowsets_1.0.1
[7] workflows_1.1.3 tune_1.1.1 rsample_1.1.1
[10] recipes_1.0.6 parsnip_1.1.0 modeldata_1.1.0
[13] infer_1.0.4 dials_1.2.0 scales_1.2.1
[16] broom_1.0.4 tidymodels_1.1.0 lubridate_1.9.2
[19] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
[22] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[25] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 timeDate_4022.108 farver_2.1.1
[4] fastmap_1.1.1 janeaustenr_1.0.0 digest_0.6.32
[7] rpart_4.1.19 timechange_0.2.0 lifecycle_1.0.3
[10] tokenizers_0.3.0 survival_3.5-5 magrittr_2.0.3
[13] compiler_4.3.1 sass_0.4.5 rlang_1.1.1
[16] tools_4.3.1 utf8_1.2.3 yaml_2.3.7
[19] data.table_1.14.8 knitr_1.42 labeling_0.4.2
[22] htmlwidgets_1.6.2 xml2_1.3.4 DiceDesign_1.9
[25] withr_2.5.0 nnet_7.3-19 grid_4.3.1
[28] fansi_1.0.4 colorspace_2.1-0 future_1.32.0
[31] globals_0.16.2 iterators_1.0.14 MASS_7.3-60
[34] cli_3.6.1 rmarkdown_2.21 generics_0.1.3
[37] rstudioapi_0.14 future.apply_1.10.0 tzdb_0.4.0
[40] splines_4.3.1 parallel_4.3.1 vctrs_0.6.3
[43] hardhat_1.3.0 Matrix_1.5-4.1 jsonlite_1.8.7
[46] hms_1.1.3 listenv_0.9.0 foreach_1.5.2
[49] gower_1.0.1 glue_1.6.2 parallelly_1.35.0
[52] codetools_0.2-19 stringi_1.7.12 gtable_0.3.3
[55] munsell_0.5.0 GPfit_1.0-8 pillar_1.9.0
[58] furrr_0.3.1 htmltools_0.5.5 ipred_0.9-14
[61] lava_1.7.2.1 R6_2.5.1 lhs_1.1.6
[64] evaluate_0.20 lattice_0.21-8 SnowballC_0.7.1
[67] png_0.1-8 backports_1.4.1 class_7.3-22
[70] Rcpp_1.0.10 prodlim_2023.03.31 xfun_0.39
[73] pkgconfig_2.0.3
py_list_packages()
package version requirement
1 absl-py 1.4.0 absl-py==1.4.0
2 aiohttp 3.8.4 aiohttp==3.8.4
3 aiosignal 1.3.1 aiosignal==1.3.1
4 altair 5.0.1 altair==5.0.1
5 appdirs 1.4.4 appdirs==1.4.4
6 async-timeout 4.0.2 async-timeout==4.0.2
7 attrs 23.1.0 attrs==23.1.0
8 bertopic 0.14.1 bertopic==0.14.1
9 blinker 1.6.2 blinker==1.6.2
10 cachetools 5.3.1 cachetools==5.3.1
11 certifi 2023.5.7 certifi==2023.5.7
12 charset-normalizer 3.1.0 charset-normalizer==3.1.0
13 click 8.1.3 click==8.1.3
14 cmake 3.26.4 cmake==3.26.4
15 Cython 0.29.35 Cython==0.29.35
16 datasets 2.13.1 datasets==2.13.1
17 decorator 5.1.1 decorator==5.1.1
18 dill 0.3.6 dill==0.3.6
19 docker-pycreds 0.4.0 docker-pycreds==0.4.0
20 filelock 3.12.2 filelock==3.12.2
21 frozenlist 1.3.3 frozenlist==1.3.3
22 fsspec 2023.6.0 fsspec==2023.6.0
23 gitdb 4.0.10 gitdb==4.0.10
24 GitPython 3.1.31 GitPython==3.1.31
25 google-auth 2.21.0 google-auth==2.21.0
26 google-auth-oauthlib 1.0.0 google-auth-oauthlib==1.0.0
27 grpcio 1.56.0 grpcio==1.56.0
28 hdbscan 0.8.29 hdbscan==0.8.29
29 huggingface-hub 0.15.1 huggingface-hub==0.15.1
30 idna 3.4 idna==3.4
31 importlib-metadata 6.7.0 importlib-metadata==6.7.0
32 Jinja2 3.1.2 Jinja2==3.1.2
33 joblib 1.3.1 joblib==1.3.1
34 jsonschema 4.17.3 jsonschema==4.17.3
35 lit 16.0.6 lit==16.0.6
36 llvmlite 0.40.1 llvmlite==0.40.1
37 Markdown 3.4.3 Markdown==3.4.3
38 markdown-it-py 3.0.0 markdown-it-py==3.0.0
39 MarkupSafe 2.1.3 MarkupSafe==2.1.3
40 mdurl 0.1.2 mdurl==0.1.2
41 mpmath 1.3.0 mpmath==1.3.0
42 multidict 6.0.4 multidict==6.0.4
43 multiprocess 0.70.14 multiprocess==0.70.14
44 networkx 3.1 networkx==3.1
45 nltk 3.8.1 nltk==3.8.1
46 numba 0.57.1 numba==0.57.1
47 numpy 1.24.4 numpy==1.24.4
48 nvidia-cublas-cu11 11.10.3.66 nvidia-cublas-cu11==11.10.3.66
49 nvidia-cuda-cupti-cu11 11.7.101 nvidia-cuda-cupti-cu11==11.7.101
50 nvidia-cuda-nvrtc-cu11 11.7.99 nvidia-cuda-nvrtc-cu11==11.7.99
51 nvidia-cuda-runtime-cu11 11.7.99 nvidia-cuda-runtime-cu11==11.7.99
52 nvidia-cudnn-cu11 8.5.0.96 nvidia-cudnn-cu11==8.5.0.96
53 nvidia-cufft-cu11 10.9.0.58 nvidia-cufft-cu11==10.9.0.58
54 nvidia-curand-cu11 10.2.10.91 nvidia-curand-cu11==10.2.10.91
55 nvidia-cusolver-cu11 11.4.0.1 nvidia-cusolver-cu11==11.4.0.1
56 nvidia-cusparse-cu11 11.7.4.91 nvidia-cusparse-cu11==11.7.4.91
57 nvidia-nccl-cu11 2.14.3 nvidia-nccl-cu11==2.14.3
58 nvidia-nvtx-cu11 11.7.91 nvidia-nvtx-cu11==11.7.91
59 oauthlib 3.2.2 oauthlib==3.2.2
60 packaging 23.1 packaging==23.1
61 pandas 2.0.3 pandas==2.0.3
62 pathtools 0.1.2 pathtools==0.1.2
63 Pillow 9.5.0 Pillow==9.5.0
64 plotly 5.15.0 plotly==5.15.0
65 protobuf 4.23.3 protobuf==4.23.3
66 psutil 5.9.5 psutil==5.9.5
67 pyarrow 12.0.1 pyarrow==12.0.1
68 pyasn1 0.5.0 pyasn1==0.5.0
69 pyasn1-modules 0.3.0 pyasn1-modules==0.3.0
70 pydeck 0.8.1b0 pydeck==0.8.1b0
71 Pygments 2.15.1 Pygments==2.15.1
72 Pympler 1.0.1 Pympler==1.0.1
73 pynndescent 0.5.10 pynndescent==0.5.10
74 pyrsistent 0.19.3 pyrsistent==0.19.3
75 python-dateutil 2.8.2 python-dateutil==2.8.2
76 pytz 2023.3 pytz==2023.3
77 pytz-deprecation-shim 0.1.0.post0 pytz-deprecation-shim==0.1.0.post0
78 PyYAML 6.0 PyYAML==6.0
79 regex 2023.6.3 regex==2023.6.3
80 requests 2.31.0 requests==2.31.0
81 requests-oauthlib 1.3.1 requests-oauthlib==1.3.1
82 rich 13.4.2 rich==13.4.2
83 rsa 4.9 rsa==4.9
84 safetensors 0.3.1 safetensors==0.3.1
85 scikit-learn 1.2.2 scikit-learn==1.2.2
86 scipy 1.11.1 scipy==1.11.1
87 sentence-transformers 2.2.2 sentence-transformers==2.2.2
88 sentencepiece 0.1.99 sentencepiece==0.1.99
89 sentry-sdk 1.26.0 sentry-sdk==1.26.0
90 seqeval 1.2.2 seqeval==1.2.2
91 setproctitle 1.3.2 setproctitle==1.3.2
92 simpletransformers 0.63.11 simpletransformers==0.63.11
93 six 1.16.0 six==1.16.0
94 smmap 5.0.0 smmap==5.0.0
95 streamlit 1.24.0 streamlit==1.24.0
96 sympy 1.12 sympy==1.12
97 tenacity 8.2.2 tenacity==8.2.2
98 tensorboard 2.13.0 tensorboard==2.13.0
99 tensorboard-data-server 0.7.1 tensorboard-data-server==0.7.1
100 threadpoolctl 3.1.0 threadpoolctl==3.1.0
101 tokenizers 0.13.3 tokenizers==0.13.3
102 toml 0.10.2 toml==0.10.2
103 toolz 0.12.0 toolz==0.12.0
104 torch 2.0.1 torch==2.0.1
105 torchvision 0.15.2 torchvision==0.15.2
106 tornado 6.3.2 tornado==6.3.2
107 tqdm 4.65.0 tqdm==4.65.0
108 transformers 4.30.2 transformers==4.30.2
109 triton 2.0.0 triton==2.0.0
110 typing_extensions 4.7.1 typing_extensions==4.7.1
111 tzdata 2023.3 tzdata==2023.3
112 tzlocal 4.3.1 tzlocal==4.3.1
113 umap-learn 0.5.3 umap-learn==0.5.3
114 urllib3 1.26.16 urllib3==1.26.16
115 validators 0.20.0 validators==0.20.0
116 wandb 0.15.4 wandb==0.15.4
117 watchdog 3.0.0 watchdog==3.0.0
118 Werkzeug 2.3.6 Werkzeug==2.3.6
119 xxhash 3.2.0 xxhash==3.2.0
120 yarl 1.9.2 yarl==1.9.2
121 zipp 3.15.0 zipp==3.15.0